-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[superseded] Different Hashing to avoid Collisions. #11767
Conversation
can you please explain the problem and what you did differently in order to fix it, before you claim that you made it 1000 times faster. |
814a02a
to
0302ad4
Compare
before PR, the hash was |
…ions for small flaots
@narimiran is working on a Murmur3 implementation. |
Obviously we can't have fancy intrinsics in the VM but if it's not too complex CityHash or Daniel Lemire's CLHash can be considered: |
7847592
to
a4eb96d
Compare
I've already made Murmur3 work in the VM. The only remaining problem is JS backend. |
@timotheecour Wow! Nice catch! |
New hash algorithm is shipping with v1, closing. |
superseded by #13418 |
fixes HashSet[uint64] slow insertion depending on values #11764
it turns out all primitive types of size >= 4 bytes (eg int32, pointer, uint, float etc) are affected by the bug, resulting in 100 to 1000 slowdown depending on conditions
for eg for pointer, the 3rd case is 1K slower; float also is similar; this PR fixes that
also fixes another bug that would cause hash collisions for small floats (previous code was using
x+1.0
which leads to 0 for small floats)see test cases here to reproduce:
results: see https://github.com/timotheecour/vitanim/blob/master/testcases/tests/t0129b.nim
note
as I also observed (see test case), the
faster hashing
PR #11203 introduced a 5X slowdown 2.229572/0.424859 in some cases, eg usinguint64
oruint32
and more (for 3rd case, withlet n = 100_000 * 10
), even after my PR #11767 is applied (ie after hash input as as a string): in other words,bytewiseHashing
andmurmur3
are 5X faster compared to the multibyte hash introduced in #11203this is related to what I had observed in #11581 but introduces the new observation that the multibyte extension to the jenkins hash is also affected for smaller inputs than oids, such as
uint64
or evenuint32
. The same conclusion as #11581 follow: we should adopt murmur3 (or at least reconsider implementation of #11203) which always comes out the fastest ; I have provided a pure nim implementation and suggested how to make it work at CT (via vm register callback)[EDIT] that 2nd point won't be observable after latest commit since code now uses
hashData(cast[pointer](unsafeAddr x), T.sizeof)
which for some reason is implemented differently thanhash*[A](aBuf: openArray[A], sPos, ePos: int)
, ie doesn't use multibyte jenkins anymore, but the point remains that multibyte jenkins can still result in 5x slowdown even for small (4B) inputs